Abstract

The aim of this paper is to provide an overview of recent developments related to
Bregman distances outside their native areas of optimization and statistics. We discuss
approaches in inverse problems and image processing based on Bregman distances, which
have evolved into a standard tool in these fields in the last decade. Moreover, we discuss
related issues in the analysis and numerical analysis of nonlinear partial differential
equations with a variational structure. For such problems Bregman distances appear to be
of similar importance, but are currently used only in a rather hidden fashion. We try to
work out explicitly the aspects related to Bregman distances, which also lead to novel
mathematical questions and may stimulate further research in these areas.
Keywords: Bregman Distances, Convexity, Duality, Error Estimates, Nonlinear Evolution
Equations, Variational Regularization, Gradient Systems


1 Introduction 

Bregman distances for (differentiable) convex functionals, originally introduced in the study 
of proximal algorithms in [12] and named in [25] , are a well established concept in continuous 
and discrete optimization in finite dimension. A classical example is the celebrated Bregman 
projection algorithm for finding points in the intersection of affine subspaces (cf. e.g. [27]). We
refer to ESIEZI for introductory and exhaustive views on Bregman distances in optimization. 

Although convex functionals play a role in many other branches of mathematics, e.g. 
in many variational problems and partial differential equations, the suitability of Bregman 
distances in such fields was hardly investigated for several decades. In mathematical imaging
and inverse problems the situation changed with the rediscovery and further development of 
Bregman iterations as an iterative image restoration technique in the case of frequently used 
regularization techniques such as total variation (cf. [52]), which led to significantly improved 
results compared to standard variational models and could eliminate systematic errors to a 
certain extent (cf. [12], [21]). Another key observation increasing the interest in Bregman
distances in these fields was that they can be employed for error estimation in particular for 
not strictly convex and nonsmooth functionals (cf. [22]), which prevent norm estimates. 

Although there are many obvious links to the main route of research on Bregman distances
and related optimization algorithms, there are several peculiar aspects that deserve particular
discussion. The functionals considered are often nonsmooth, and problems in imaging, inverse
problems and partial differential equations are naturally formulated in infinite-dimensional
Banach spaces such as the space of functions of bounded variation or Sobolev spaces, which
have only been considered in few instances before. Beyond these issues, a key point is
that the motivation for using Bregman distances in these fields often differs significantly from
that in optimization and statistics. In the following we provide an overview of such
questions and the consequent developments, keeping an eye on potential directions and questions
for future research. We start with a section including definitions, examples and some general
properties of Bregman distances, before we survey aspects of Bregman distances in inverse
problems and imaging developed in the last decade. Then we proceed to a discussion of Bregman
distances in partial differential equations, where their use is less explicit; hence the main goal
is to highlight the hidden use of Bregman distances and make the idea more directly accessible
for future research. Finally we conclude with a section on related recent developments.

2 Bregman Distances and their Basic Properties 

We start with the definition of a Bregman distance. In the remainder of this paper, let $X$ be a
Banach space and $J : X \to \mathbb{R} \cup \{+\infty\}$ be a convex functional. We first recall the definition of
the subdifferential and of subgradients.

Definition 2.1. The subdifferential of a convex functional $J$ is defined by

$$ \partial J(u) = \{ p \in X^* \mid J(u) + \langle p, v - u \rangle \le J(v) \text{ for all } v \in X \}. \tag{2.1} $$

An element $p \in \partial J(u)$ is called a subgradient.

Having defined the subdifferential, we can proceed to the definition of Bregman distances,
respectively generalized Bregman distances according to [16].

Definition 2.2. The (generalized) Bregman distance related to a convex functional $J$ with
subgradient $p$ is defined by

$$ D_J^p(v,u) = J(v) - J(u) - \langle p, v - u \rangle, \tag{2.2} $$

where $p \in \partial J(u)$. The symmetric Bregman distance is defined by

$$ D_J^{p,q}(u,v) = D_J^p(v,u) + D_J^q(u,v) = \langle p - q, u - v \rangle, \tag{2.3} $$

where $p \in \partial J(u)$, $q \in \partial J(v)$.

Note that in the differentiable case, i.e. $\partial J(u)$ being a singleton, we can omit the special
subgradient and write $D_J(v,u)$ or $D_J^{\mathrm{symm}}(u,v)$.

By the definition of subgradients the nonnegativity is apparent:

Proposition 2.3. Let $J$ be convex and $p \in \partial J(u)$. Then

$$ D_J^p(v,u) \ge 0 \quad \forall\, v \in X $$

and

$$ D_J^p(u,u) = 0. $$

If $J$ is strictly convex, then $D_J^p(v,u) > 0$ for $v \neq u$.
We can further characterize vanishing Bregman distances as sharing a subgradient:

Proposition 2.4. Let $J$ be convex and $p \in \partial J(u)$. Then $D_J^p(v,u) = 0$ if and only if $p \in \partial J(v)$.

Since Bregman distances are convex with respect to the first argument, we can also compute
a subdifferential with respect to that variable, which is simply a shift of the subdifferential
of $J$:

Proposition 2.5. Let $J$ be convex, $p \in \partial J(u)$. Then

$$ \partial_v D_J^p(v,u) = \partial J(v) - p. $$

Concerning existence proofs for variational problems involving Bregman distances it is
often useful to investigate lower semicontinuity properties. Since Bregman distances can be
considered as affine perturbations of the functional $J$, it is natural that these properties
carry over:

Proposition 2.6. Let $J$ be convex and $q \in \partial J(v)$. Then the functional $H$ defined by

$$ H(u) = D_J^q(u,v) $$

is convex. Hence, if $X$ is reflexive, then $H$ is weakly lower semicontinuous. If $X$ is the dual
of some Banach space $Z$ and $J$ is the convex conjugate of a functional on $Z$, then $q \in Z$
implies that $H$ is lower semicontinuous in the weak star topology.

2.1 Examples of Bregman Distances 

In the following we provide several examples of Bregman distances as frequently found in 
literature as well as some that received recent attention. This shall provide further insights 
into the relation to other distance measures and the basic properties of Bregman distances: 

Example 2.7. Let $X$ be a Hilbert space and $J(u) = \frac{1}{2}\|u\|^2$. Then $\partial J(u) = \{u\}$ and hence

$$ D_J^u(v,u) = \frac{1}{2}\|u - v\|^2. \tag{2.4} $$

Example 2.8. Let $I$ be a countable index set and $X = \ell^1(I)$ with

$$ J(u) = \|u\|_{\ell^1} = \sum_{i \in I} |u_i|. $$

Then the Bregman distance is given by

$$ D_J^p(v,u) = \sum_{i \in I} (q_i - p_i) v_i = \sum_{i,\, v_i > 0} (1 - p_i)|v_i| + \sum_{i,\, v_i < 0} (1 + p_i)|v_i|, \tag{2.5} $$

where $q \in \partial J(v)$. Note that the above sums have nonzero entries only if the sign of $u_i$ does not match the sign
of $v_i$, since $p_i = 1$ if $u_i > 0$ and $p_i = -1$ if $u_i < 0$.
Example 2.9. Let $X = \ell^1_+(\{1,\ldots,N\})$ with

$$ J(u) = \sum_{i=1}^N \left( u_i \log u_i + 1 - u_i \right), $$

which is called the logarithmic entropy (or Boltzmann entropy). Then the Bregman distance
is given by

$$ D_J^p(v,u) = \sum_{i=1}^N \left( v_i \log \frac{v_i}{u_i} + u_i - v_i \right), \tag{2.6} $$

which is known as the Kullback-Leibler divergence. An analogous treatment applies to
$X = L^1_+(\Omega)$, for a bounded domain $\Omega$, and the continuous version

$$ J(u) = \int_\Omega \left( u(x) \log u(x) + 1 - u(x) \right) dx, $$

resulting in the Bregman distance

$$ D_J^p(v,u) = \int_\Omega \left( v(x) \log \frac{v(x)}{u(x)} + u(x) - v(x) \right) dx. \tag{2.7} $$
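The three examples above can be checked numerically in finite dimensions. The following is a minimal sketch in plain Python, assuming vectors are given as lists of floats; the helper names (`bregman`, `J_quad`, `subgrad_l1`, and so on) are illustrative and not taken from the paper. For the $\ell^1$-norm the sign vector is used as subgradient, which is the unique choice only when no entry of $u$ vanishes.

```python
import math

def bregman(J, grad_J, v, u):
    """Bregman distance D_J^p(v, u) = J(v) - J(u) - <p, v - u> with p = grad_J(u)."""
    p = grad_J(u)
    return J(v) - J(u) - sum(pi * (vi - ui) for pi, vi, ui in zip(p, v, u))

# Example 2.7: quadratic functional J(u) = 0.5 * ||u||^2, gradient u.
def J_quad(u):
    return 0.5 * sum(x * x for x in u)

def grad_quad(u):
    return list(u)

# Example 2.8: l1-norm; the sign vector is a subgradient (unique if no entry is zero).
def J_l1(u):
    return sum(abs(x) for x in u)

def subgrad_l1(u):
    return [(x > 0) - (x < 0) for x in u]

# Example 2.9: logarithmic entropy on positive vectors, gradient log(u).
def J_ent(u):
    return sum(x * math.log(x) + 1 - x for x in u)

def grad_ent(u):
    return [math.log(x) for x in u]

def kullback_leibler(v, u):
    """Right-hand side of (2.6)."""
    return sum(vi * math.log(vi / ui) + ui - vi for vi, ui in zip(v, u))

u = [1.0, 2.0, 0.5]
v = [0.5, 1.0, 3.0]
d_quad = bregman(J_quad, grad_quad, v, u)   # equals 0.5 * ||u - v||^2
d_ent = bregman(J_ent, grad_ent, v, u)      # equals the Kullback-Leibler divergence

u_signed = [1.0, -2.0, 0.5]
v_signed = [0.5, 1.0, -3.0]
d_l1 = bregman(J_l1, subgrad_l1, v_signed, u_signed)  # only sign mismatches contribute
```

In the $\ell^1$ case only the second and third components, where the signs of $u_i$ and $v_i$ differ, contribute to the distance, in line with the remark after (2.5).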

2.2 Bregman Distances and Duality 

Duality is a basic ingredient in convex optimization (cf. [M]) and hence it is also interesting 
to understand some connections between duality and Bregman distances. For this purpose we employ
the convex conjugate (also called Legendre-Fenchel transform) of a functional $J$, given by
$J^* : X^* \to \mathbb{R} \cup \{+\infty\}$ satisfying

$$ J^*(p) = \sup_{u \in X} \left( \langle p, u \rangle - J(u) \right). \tag{2.8} $$

Noticing that for $p \in \partial J(u)$ we have $J^*(p) = \langle p, u \rangle - J(u)$, one can immediately rewrite
the Bregman distance as

$$ D_J^p(v,u) = J(v) + J^*(p) - \langle p, v \rangle, \tag{2.9} $$

which can be interpreted as measuring the deviation of $p$ from being a subgradient in $\partial J(v)$
or the deviation of $v$ from being a subgradient in $\partial J^*(p)$.

A key identity relates Bregman distances with respect to $J$ to those with respect to the
convex conjugate $J^*$:

Proposition 2.10. Let $p \in \partial J(u)$ and $q \in \partial J(v)$. Then

$$ D_J^p(v,u) = D_{J^*}^v(p,q). \tag{2.10} $$

Proof. By simple reordering we find

$$ D_J^p(v,u) = J(v) - \langle p, v \rangle + \langle p, u \rangle - J(u) = J(v) - \langle p, v \rangle + J^*(p), $$

where we have used the maximality relation for the convex conjugate, which is equivalent to
$p \in \partial J(u)$. With analogous reasoning we find $J^*(q) = \langle q, v \rangle - J(v)$ and hence

$$ D_J^p(v,u) = J^*(p) - J^*(q) - \langle p - q, v \rangle = D_{J^*}^v(p,q), $$

noticing that $q \in \partial J(v)$ implies $v \in \partial J^*(q)$. □
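The duality identity can be illustrated with the logarithmic entropy of Example 2.9, whose convex conjugate is $J^*(p) = \sum_i (e^{p_i} - 1)$ (the supremum in the definition of the conjugate is attained at $u_i = e^{p_i}$). The sketch below, in plain Python with illustrative function names that are not part of the paper, evaluates the primal and the dual Bregman distance for one pair of positive vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def J(u):
    """Logarithmic entropy from Example 2.9."""
    return sum(x * math.log(x) + 1 - x for x in u)

def J_star(p):
    """Convex conjugate of J: sup_u <p,u> - J(u) is attained at u = exp(p)."""
    return sum(math.exp(x) - 1 for x in p)

def grad_J(u):
    return [math.log(x) for x in u]

def grad_J_star(p):
    return [math.exp(x) for x in p]

def D_primal(v, u):
    """D_J^p(v, u) with p = J'(u)."""
    p = grad_J(u)
    return J(v) - J(u) - dot(p, [vi - ui for vi, ui in zip(v, u)])

def D_dual(p, q):
    """D_{J*}^v(p, q) with v = (J*)'(q)."""
    v = grad_J_star(q)
    return J_star(p) - J_star(q) - dot(v, [pi - qi for pi, qi in zip(p, q)])

u = [1.0, 2.0, 0.5]
v = [0.5, 1.0, 3.0]
p, q = grad_J(u), grad_J(v)

lhs = D_primal(v, u)   # Bregman distance with respect to J
rhs = D_dual(p, q)     # Bregman distance with respect to J*, arguments swapped roles
```

Up to floating point rounding the two values coincide, as the identity relating $D_J$ and $D_{J^*}$ predicts.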
A second aspect of duality related to Bregman distances is the convex conjugate of the
latter, which shows that Bregman distances are dual to measuring differences via a functional:

Proposition 2.11. Let $q \in \partial J(v)$ and $H$ be defined by

$$ H(u) = D_J^q(u,v). \tag{2.11} $$

Then

$$ H^*(p) = J^*(p + q) - J^*(q). \tag{2.12} $$

Proof. We have

$$ H^*(p) = \sup_u \left[ \langle p, u \rangle - J(u) + J(v) - \langle q, v - u \rangle \right] = \sup_u \left[ \langle p + q, u \rangle - J(u) \right] - \left[ \langle q, v \rangle - J(v) \right]. $$

The first term equals $J^*(p + q)$ by definition and the second equals $J^*(q)$ since $q \in \partial J(v)$. □

2.3 Bregman Distances and Fenchel duality 

In the following we further investigate some properties of Bregman distances for a combination 
of two convex functionals $F : X \to \mathbb{R} \cup \{+\infty\}$, $G : Y \to \mathbb{R} \cup \{+\infty\}$. The classical setting is
related to Fenchel’s duality theorem (cf. [S]); where 

$$ J(u) := F(u) + G(Ku) \tag{2.13} $$

with $K : X \to Y$ a bounded linear operator between Banach spaces. The Fenchel duality
theorem shows that under suitable conditions

$$ \inf_u J(u) = \sup_w \left[ -F^*(-K^*w) - G^*(w) \right], \tag{2.14} $$

together with equations relating optimal solutions $u$ and $w$ via subdifferentials of the involved
functionals

$$ -K^*w \in \partial F(u), \qquad Ku \in \partial G^*(w). \tag{2.15} $$

The above duality opens the possibility to employ Bregman distances on the dual problem 
as well as on the primal, which is nicely complemented by the duality relations for Bregman 
distances of a functional and its convex conjugate. 

In the following we derive a basic estimate for the variational problem (2.13), which
clarifies the relation of perturbations of one functional with duality and Bregman distances.
We shall assume that the regularity of $F$ and $G$ is such that

$$ \partial J(u) = \partial F(u) + K^* \partial G(Ku) $$

and the Fenchel duality theorem holds (cf. [3Tj for details). 

Then we obtain the following estimate for perturbations of J: 

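Theorem 2.12. Let $F$, $G$ and $K$ be as above, and let $\tilde G$ be a perturbation of $G$ satisfying the
same assumptions. Let $u \in X$ be a minimizer of $J$ with $-K^*w \in \partial F(u)$ and $\tilde u$ be a minimizer
of $F(\cdot) + \tilde G(K\cdot)$ with $-K^*\tilde w \in \partial F(\tilde u)$. Then

$$ D_F^{-K^*w,\,-K^*\tilde w}(u,\tilde u) \le G^*(\tilde w) - G^*(w) + \tilde G^*(w) - \tilde G^*(\tilde w). \tag{2.16} $$

Proof. We have

$$ D_F^{-K^*w,\,-K^*\tilde w}(u,\tilde u) = \langle K^*\tilde w - K^*w, u - \tilde u \rangle = \langle Ku, \tilde w - w \rangle + \langle K\tilde u, w - \tilde w \rangle. $$

By the Fenchel duality theorem we have $Ku \in \partial G^*(w)$ and $K\tilde u \in \partial \tilde G^*(\tilde w)$, which implies the
assertion by inserting the subgradient inequalities $\langle Ku, \tilde w - w \rangle \le G^*(\tilde w) - G^*(w)$ and
$\langle K\tilde u, w - \tilde w \rangle \le \tilde G^*(w) - \tilde G^*(\tilde w)$. □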


2.4 Bregman Distances for One-homogeneous Functionals 

The case of convex one-homogeneous functionals $J$, i.e.

$$ J(tu) = |t|\, J(u) \quad \forall\, t \in \mathbb{R}, \tag{2.17} $$

has received strong attention recently, and also appears to be a particularly interesting one with
respect to Bregman distances. In the one-homogeneous case one has

$$ J(u) = \langle p, u \rangle \tag{2.18} $$

for $p \in \partial J(u)$. Thus, the Bregman distance simply reduces to

$$ D_J^p(v,u) = J(v) - \langle p, v \rangle. \tag{2.19} $$

An interesting property in the one-homogeneous case is the fact that the convex conjugate
is the indicator function of a convex set $C$, i.e.,

$$ J^*(p) = \begin{cases} 0 & \text{if } p \in C, \\ +\infty & \text{else.} \end{cases} \tag{2.20} $$

This sheds interesting light on (2.10), noticing that $p \in \partial J(u)$ implies $p \in C$. Hence,

$$ D_J^p(v,u) = D_{J^*}^v(p,q) = \langle q - p, v \rangle. $$

An alternative way to see this property is (2.19) combined with $\langle q, v \rangle = J(v)$.

In the one-homogeneous case we immediately find an example of Bregman distances vanishing
for $v \neq u$. Let $t > 0$ and $v = tu$, then $\partial J(v) = \partial J(u)$ implies $D_J^p(v,u) = 0$. On
the other hand we observe that the Bregman distance distinguishes different orientations.
Choosing $v = tu$ for $t < 0$ we have $\partial J(v) = -\partial J(u)$, hence $D_J^p(v,u) = 2J(v)$.
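Both scaling effects can be observed for the $\ell^1$-norm, which is one-homogeneous. The following plain Python sketch uses the reduced formula $D_J^p(v,u) = J(v) - \langle p, v \rangle$; the function names are illustrative, and $u$ is assumed to have no zero entries so that the sign vector is the only subgradient.

```python
def l1_norm(u):
    return sum(abs(x) for x in u)

def sign(x):
    return (x > 0) - (x < 0)

def bregman_l1(v, u):
    """D_J^p(v, u) = J(v) - <p, v> for J the l1-norm and p = sign(u)."""
    p = [sign(x) for x in u]  # unique subgradient since u has no zero entry
    return l1_norm(v) - sum(pi * vi for pi, vi in zip(p, v))

u = [1.0, -2.0, 0.5]

# Positive rescaling: v = 3u shares the subgradient with u, so the distance vanishes.
v_same = [3.0 * x for x in u]
d_same = bregman_l1(v_same, u)

# Orientation flip: v = -2u has the opposite subgradient, so the distance is 2 J(v).
v_flip = [-2.0 * x for x in u]
d_flip = bregman_l1(v_flip, u)
```

Here `d_same` is zero although `v_same` differs from `u`, while `d_flip` equals twice the $\ell^1$-norm of `v_flip`.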

3 Applications in Inverse Problems and Imaging 

In the last decade, Bregman distances have become an important tool in inverse problems 
and image processing. Their main use is twofold: On the one hand they are of particular 
importance for all kinds of error estimates as already sketched above and in particular they 
are quite useful for the analysis of variational regularization techniques with nondifferentiable 
regularization functionals. This route has been initiated in [22] and subsequently expanded 
e.g. in P [23l |36l [38l ISU SOI [Ml ESI ES]- On the other hand Bregman distances can be 
used to construct novel iterative techniques with superior properties compared to classical 
variational regularization. This route goes back to [52] and was developed further e.g. in 
[SollSIllMlETlIlHlISSllMlES], the methods also had a huge impact on various applications
(cf. e.g. [sniEiiEn])- 

The basic setup we consider is the solution of a problem of the form $Ku = f$, where
$K : X \to Y$ is a bounded linear operator between Banach spaces and $f$ are given data. Since
in most cases $K$ does not have a closed range (or is even a compact operator) and data contain
measurement errors, this problem can be ill-posed. To cure this issue variational regularization
methods employ a convex regularization functional $R : X \to \mathbb{R} \cup \{+\infty\}$, which introduces
the a-priori knowledge that reasonable approximations of the solution $u$ have small (minimal)
values $R(u)$. Variational regularization methods make a compromise between approximating
the data $f$ and minimizing $R$ and solve a problem of the form

$$ D(Ku, f) + \alpha R(u) \to \min_{u \in X}, \tag{3.1} $$

where $D : Y \times Y \to \mathbb{R}$ is an appropriate distance measure and $\alpha > 0$ is a regularization
parameter to be chosen appropriately in dependence of the measurement error (often referred
to as data noise). Specific forms of the distance measure are derived e.g. via statistical
modelling as the negative log-likelihood of the data noise. Frequently $D$ is simply a least-squares
term, i.e. $Y$ is a Hilbert space and

$$ D(Ku, f) = \frac{1}{2}\|Ku - f\|_Y^2. \tag{3.2} $$

A classical example is the ROF-model for image denoising m, where R is the total variation 
seminorm, $K$ is an embedding from $BV(\Omega) \cap L^2(\Omega)$ into $L^2(\Omega)$, and $D$ the squared $L^2$-norm.
For the whole section we shall assume that D is convex with respect to the first variable, 
which is the case for almost all commonly used examples. 

3.1 Error Estimates 

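Error estimates for solutions of (3.1) are of interest with respect to two quantities: First of
all, the distance of the data $f$ to the ideal data $Ku^*$, where $u^*$ is the unknown ideal solution.
This part is referred to as data error or noise. Secondly, the regularization parameter $\alpha$, which
should be equal to zero in the case of ideal data and introduces a systematic error in the case of
perturbed data (when it needs to be positive). In the setting of (2.13) we thus need to choose

$$ F(u) = \alpha R(u), \qquad G(Ku) = D(Ku, f). \tag{3.3} $$

The optimality conditions for a minimizer $u_\alpha$ are then of the form

$$ p_\alpha = K^*w_\alpha, \qquad p_\alpha \in \partial R(u_\alpha), \qquad -\alpha w_\alpha \in \partial D(Ku_\alpha, f), \tag{3.4} $$

where the subgradient of $D$ is meant to be computed with respect to the first argument for
fixed $f$. In terms of (2.15) the dual variable is $w = -\alpha w_\alpha$.

In order to obtain error estimates for some different data $\tilde f$ we choose $\tilde G(Ku) = D(Ku, \tilde f)$
and denote by $\tilde u_\alpha$ its corresponding regularized solution with

$$ \tilde p_\alpha = K^*\tilde w_\alpha, \qquad \tilde p_\alpha \in \partial R(\tilde u_\alpha). $$

Then (2.16) yields

$$ \alpha D_R^{p_\alpha,\tilde p_\alpha}(u_\alpha,\tilde u_\alpha) \le G^*(-\alpha\tilde w_\alpha) - G^*(-\alpha w_\alpha) + \tilde G^*(-\alpha w_\alpha) - \tilde G^*(-\alpha\tilde w_\alpha). \tag{3.5} $$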
To further illustrate the behaviour consider the case of a quadratic data fidelity

$$ G(Ku) = D(Ku, f) = \frac{1}{2}\|Ku - f\|^2, \tag{3.6} $$

for some squared Hilbert space norm, which yields $G^*(w) = \frac{1}{2}\|w\|^2 + \langle w, f \rangle$. Hence,

$$ \alpha D_R^{p_\alpha,\tilde p_\alpha}(u_\alpha,\tilde u_\alpha) \le \alpha \langle f - \tilde f, w_\alpha - \tilde w_\alpha \rangle. \tag{3.7} $$

In the case (3.6) one can see quite immediately why the (symmetric) Bregman distance is
an appropriate error measure for the estimates. Starting with the optimality conditions

$$ Ku_\alpha - f + \alpha w_\alpha = 0, \qquad p_\alpha = K^*w_\alpha \in \partial R(u_\alpha), $$

$$ K\tilde u_\alpha - \tilde f + \alpha \tilde w_\alpha = 0, \qquad \tilde p_\alpha = K^*\tilde w_\alpha \in \partial R(\tilde u_\alpha), $$

we find

$$ K(u_\alpha - \tilde u_\alpha) + \alpha(w_\alpha - \tilde w_\alpha) = f - \tilde f. \tag{3.8} $$

The right-hand side is exactly the perturbation of the data, whose norm we want to use to
estimate errors in the solution $u_\alpha$. Hence we simply take the squared norm on both sides and
obtain by expanding on the left-hand side

$$ \|K(u_\alpha - \tilde u_\alpha)\|^2 + 2\alpha \langle w_\alpha - \tilde w_\alpha, K(u_\alpha - \tilde u_\alpha) \rangle + \alpha^2 \|w_\alpha - \tilde w_\alpha\|^2 = \|f - \tilde f\|^2. $$

Finally using $K^*w_\alpha = p_\alpha$ we arrive at

$$ \|K(u_\alpha - \tilde u_\alpha)\|^2 + 2\alpha D_R^{p_\alpha,\tilde p_\alpha}(u_\alpha,\tilde u_\alpha) + \alpha^2 \|w_\alpha - \tilde w_\alpha\|^2 = \|f - \tilde f\|^2, \tag{3.9} $$

which implies (by the nonnegativity of all involved terms) the immediate estimate

$$ D_R^{p_\alpha,\tilde p_\alpha}(u_\alpha,\tilde u_\alpha) \le \frac{1}{2\alpha}\|f - \tilde f\|^2 \tag{3.10} $$

for the Bregman distance. Note that (3.9) is not just an estimate, but indeed an equality for
three error terms: the error in the image of the operator $K$ (essentially the residual), the error
in the dual variables $w$, and the Bregman distance of solutions. Here $Ku$ and $w$ are elements
of a Hilbert space and it is of course natural to measure their deviations in the corresponding
norm, so (3.9) yields the Bregman distance as the naturally induced error measure in the
Banach space $X$.
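The error identity for quadratic fidelities can be checked by hand in the simplest possible setting. The sketch below is an illustration under strong assumptions that are not part of the paper: $X = Y = \mathbb{R}$, $K$ is multiplication by a nonzero scalar `k`, and $R(u) = |u|$, so that the regularized solution is given in closed form by soft thresholding. The names `soft_threshold` and `solve_scalar` are hypothetical helpers.

```python
def soft_threshold(x, t):
    """Soft thresholding: the minimizer of 0.5*(z - x)**2 + t*|z| over z."""
    sign = (x > 0) - (x < 0)
    return max(abs(x) - t, 0.0) * sign

def solve_scalar(k, f, alpha):
    """Minimize 0.5*(k*u - f)**2 + alpha*|u| for scalar u and k != 0.

    Returns the minimizer u, the dual variable w with k*u - f + alpha*w = 0,
    and the subgradient p = k*w of |.| at u.
    """
    u = soft_threshold(k * f, alpha) / (k * k)
    w = (f - k * u) / alpha
    p = k * w
    return u, w, p

k, alpha = 2.0, 1.0
f, f_tilde = 3.0, 0.3          # two different data values

u, w, p = solve_scalar(k, f, alpha)
u_t, w_t, p_t = solve_scalar(k, f_tilde, alpha)

# Symmetric Bregman distance of R = |.| between the two solutions.
d_symm = (p - p_t) * (u - u_t)

# Left- and right-hand side of the three-term error identity.
lhs = (k * (u - u_t)) ** 2 + 2 * alpha * d_symm + alpha ** 2 * (w - w_t) ** 2
rhs = (f - f_tilde) ** 2
```

With these values the second solution is thresholded to zero while the first is not, so the Bregman term is strictly positive; `lhs` and `rhs` agree up to rounding, and `d_symm` stays below `rhs / (2 * alpha)`.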

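Having obtained (3.9) it is interesting to note that one can alternatively obtain estimates
for two of the three error terms by taking scalar products of (3.8) with appropriate
elements and subsequent application of the Cauchy-Schwarz respectively Young's inequality.
The first is obtained by a scalar product with $Ku_\alpha - K\tilde u_\alpha$, which yields

$$ \|K(u_\alpha - \tilde u_\alpha)\|^2 + \alpha D_R^{p_\alpha,\tilde p_\alpha}(u_\alpha,\tilde u_\alpha) = \langle f - \tilde f, K(u_\alpha - \tilde u_\alpha) \rangle \le \frac{1}{2}\|f - \tilde f\|^2 + \frac{1}{2}\|K(u_\alpha - \tilde u_\alpha)\|^2, $$

hence

$$ \|K(u_\alpha - \tilde u_\alpha)\|^2 + 2\alpha D_R^{p_\alpha,\tilde p_\alpha}(u_\alpha,\tilde u_\alpha) \le \|f - \tilde f\|^2. \tag{3.11} $$

Using analogous reasoning, a scalar product of (3.8) with $w_\alpha - \tilde w_\alpha$ leads to

$$ 2\alpha D_R^{p_\alpha,\tilde p_\alpha}(u_\alpha,\tilde u_\alpha) + \alpha^2 \|w_\alpha - \tilde w_\alpha\|^2 \le \|f - \tilde f\|^2. \tag{3.12} $$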


3.2 Asymptotics 

A key question in inverse problems is the behaviour of the regularized solution $u_\alpha$ as $\alpha \to 0$,
which only makes sense if the noise in the data vanishes, i.e. $f = Ku^*$ for some desired solution
$u^* \in X$. It is well-known that for ill-posed problems the convergence can be arbitrarily slow
as $\alpha \to 0$ without further conditions on the desired solution $u^*$. For a further characterization it
is important to note that under an appropriate choice of $\alpha$ a limiting solution $u^*$ of the variational
model (3.1) satisfies

$$ R(u) \to \min_{u \in X} \quad \text{subject to } Ku = Ku^*. \tag{3.13} $$

This can be seen from the estimate

$$ D(Ku_\alpha, f) + \alpha R(u_\alpha) \le D(Ku^*, f) + \alpha R(u^*). $$

Using $\alpha \to 0$ and $D(Ku^*, f) \to 0$ we see that $D(Ku_\alpha, f) \to 0$, hence the limit is a solution of
$Ku^* = f$. Dividing by $\alpha$ and using nonnegativity of $D$, we find

$$ R(u_\alpha) \le R(u^*) + \frac{D(Ku^*, f)}{\alpha}, $$

and under the standard condition on the parameter choice

$$ \frac{D(Ku^*, f)}{\alpha} \to 0 $$

we observe that the limit of $u_\alpha$ cannot have a larger value of $R$ than any other solution of
$Ku = f$, i.e. it solves (3.13).

The key observation in [22], [29] is that appropriate conditions in the case of variational
regularization are related to the existence of a Lagrange multiplier for (3.13). The Lagrange
functional is given by $L(u,w) = R(u) - \langle w, Ku - Ku^* \rangle$, hence the existence of a Lagrange
multiplier is the so-called source condition

$$ p^* = K^*w^* \in \partial R(u^*). \tag{3.14} $$

Let us again detail the arguments in the case (3.6), where we can indeed use the above error
estimates like (3.9) with $\tilde u_\alpha = u^*$ and $\tilde w_\alpha = w^*$. In order to obtain $\tilde u_\alpha$ as the solution of a
variational problem we can indeed choose $\tilde f = Ku^* + \alpha w^*$ (note that (3.14) is equivalent to
the existence of some $\tilde f$ such that $u^*$ solves the variational problem with data $\tilde f$, cf. [22]).
Hence, (3.9) becomes

$$ \|K(u_\alpha - u^*)\|^2 + 2\alpha D_R^{p_\alpha,p^*}(u_\alpha,u^*) + \alpha^2 \|w_\alpha - w^*\|^2 = \|f - Ku^* - \alpha w^*\|^2. \tag{3.15} $$

Again with Young's inequality we end up at

$$ D_R^{p_\alpha,p^*}(u_\alpha,u^*) \le \frac{\|f - Ku^*\|^2}{\alpha} + \alpha \|w^*\|^2, \tag{3.16} $$

which gives the usual optimal choice $\alpha \sim \|f - Ku^*\|$ of the regularization parameter in terms of
the noise level, exactly as in the linear Hilbert space case (cf. [35]).
3.3 Bregman Iterations and Inverse Scale Space Methods

A frequent observation made for variational methods as discussed above is a systematic bias,
in particular the methods yield solutions $u_\alpha$ with $R(u_\alpha)$ too small, which e.g. results in a
local loss of contrast in the case of total variation regularization (the contrast loss is larger
for smaller structures). In order to cure such systematic errors, in particular in the case of
one-homogeneous regularization, it turned out that the well-known Bregman iteration is a
perfect tool. Instead of solving the variational problem only once one usually starts at $u_0$
being a minimizer of the regularization functional $R$, i.e. at the coarsest scale (if one agrees
that scale is defined by $R$). Then of course $p_0 = 0 \in \partial R(u_0)$ and one can subsequently iterate

$$ u_{k+1} \in \arg\min_u \left( D(Ku, f) + \alpha D_R^{p_k}(u, u_k) \right), \tag{3.17} $$

where the subgradient $p_k$ is updated via the optimality condition

$$ p_{k+1} - p_k \in -\frac{1}{\alpha} K^* \partial D(Ku_{k+1}, f). \tag{3.18} $$

Noticing that we can again write $p_k = K^*w_k$ one can also construct an iteration

$$ w_{k+1} - w_k \in -\frac{1}{\alpha} \partial D(Ku_{k+1}, f), \tag{3.19} $$

from which one can derive the well-known equivalence to augmented Lagrangian methods for
minimizing $R$ subject to $Ku = f$.
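The iteration can be made concrete in a toy setting. The sketch below assumes, purely for illustration, that $K$ is the identity on $\mathbb{R}^N$, that $D(u,f) = \frac{1}{2}\|u - f\|^2$ and that $R$ is the $\ell^1$-norm; under these assumptions each step has the closed form $u_{k+1} = \mathrm{soft}(f + \alpha p_k, \alpha)$ followed by $p_{k+1} = p_k + (f - u_{k+1})/\alpha$. The function names are hypothetical.

```python
def soft_threshold(x, t):
    """Soft thresholding: the minimizer of 0.5*(z - x)**2 + t*|z| over z."""
    sign = (x > 0) - (x < 0)
    return max(abs(x) - t, 0.0) * sign

def bregman_iteration(f, alpha, n_iter):
    """Bregman iteration for 0.5*||u - f||^2 with R the l1-norm and K the identity.

    Starts from u_0 = 0 and p_0 = 0 and returns the list of iterates u_1, u_2, ...
    """
    p = [0.0] * len(f)
    iterates = []
    for _ in range(n_iter):
        # Minimize 0.5*||u - f||^2 + alpha*(||u||_1 - <p_k, u>), componentwise.
        u = [soft_threshold(fi + alpha * pi, alpha) for fi, pi in zip(f, p)]
        # Subgradient update from the optimality condition.
        p = [pi + (fi - ui) / alpha for pi, fi, ui in zip(p, f, u)]
        iterates.append(u)
    return iterates

f = [3.0, 0.5]
iterates = bregman_iteration(f, alpha=2.0, n_iter=5)
```

The first iterate is the usual variational solution with its loss of contrast (the large entry is shrunk from 3 to 1), the second iterate already restores the large entry, and the small entry only enters after several further steps. For noisy data this ordering is what makes early stopping act as a regularization.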

The convergence analysis in the case $f = Ku^*$ follows the well-known route for the Bregman
iteration, but due to the ill-posedness of $Ku = f$ there is a particularly interesting aspect
in the case of noisy data $f$ differing from the ideal $Ku^*$. If the range of $K$ is not closed, one
has to take care of the situation where neither a solution of $Ku = f$ nor some kind of least
squares solution (a minimizer of $D(Ku, f)$) exists in $X$. Hence, the Bregman iteration has
the role of an iterative regularization method and needs to be stopped appropriately before
the noise effects start to deteriorate the quality of the solution. Indeed one can show that
the Bregman distance $D_R^{p_k}(u^*, u_k)$ is decreasing during the first iterations up to a certain
point when the residual $D(Ku_k, f)$ becomes too small (i.e. one approximates the noisy data
more closely than $Ku^*$ does). Successful stopping criteria such as the discrepancy principle are indeed based
on comparing the residual with an estimate of the noise $D(Ku^*, f)$ and stop when $D(Ku_k, f)$
drops below this estimate.

In imaging a particularly interesting and quite related aspect of Bregman iterations is
the scale behaviour. As mentioned above, with scale defined by properties of the
regularization functional $R$, the Bregman iteration inserts finer and finer scales during its
progress. In order not to miss certain scales it is obviously interesting to make small enough
steps, which amounts to choosing $\alpha$ sufficiently large. For the limit $\alpha \to \infty$ one can
interpret the iteration as a backward Euler discretization (with timestep $\frac{1}{\alpha}$) of a flow, which
has been called inverse scale space method in reminiscence of so-called scale space methods
in image processing, which exhibit the opposite scale behaviour (cf. [61], [59]). The inverse
scale space flow is a solution of the differential inclusion

$$ \partial_t p(t) \in -K^* \partial D(Ku(t), f), \qquad p(t) \in \partial R(u(t)), \tag{3.20} $$

with initial value $u(0) = u_0$ such that $p(0) = 0 \in \partial R(u_0)$. It can be interpreted as a gradient
flow for the subgradient $p$ on a dual functional (cf. [16]) or as a doubly nonlinear evolution
equation. For the latter we will give an explanation of the analysis in terms of Bregman
distances related to the involved functionals in the next section, which is also the appropriate
way to analyze the inverse scale space method.

An unexpected result is the behaviour of the inverse scale space flow for polyhedral
functions such as the $\ell^1$-norm. Roughly speaking the polyhedral case means that for any
$u \in X$ the subdifferential $\partial R(u)$ can be obtained via convex combinations of a finite number
of elements (independent of $u$). It has been shown (cf. [15], [50]) that in such cases and for
$D(Ku, f) = \frac{1}{2}\|Ku - f\|^2$ the dynamics of the solution $u(t)$ is piecewise constant in time, i.e.
quite far from a continuous flow, while the dynamics of the subgradients $p(t)$ is piecewise
linear in time. Interestingly, the time steps at which the solution changes can be computed
explicitly, and the value of $u(t_k)$ is obtained by minimizing

$$ \|Ku - f\|^2 \quad \text{subject to } p(t_k) \in \partial R(u). $$

This is particularly attractive in the case of sparse optimization with $R$ being the $\ell^1$-norm,
since the condition $p(t_k) \in \partial R(u)$ defines the sign of $u$ and in particular the set of zeros. This
means that the least-squares problems have to be solved on a rather small support, which
is highly attractive for computational purposes (cf. [15]). Let us briefly explain the behaviour
for $R : \mathbb{R}^N \to \mathbb{R}$ being the $\ell^1$-norm and some arbitrary differentiable functional $G$ on the
right-hand side, i.e.,

$$ \partial_t p_i(t) = -\partial_{u_i} G(u(t)). \tag{3.21} $$

In this case the subdifferential is the multivalued sign of $u_i(t)$ and for $u_0 = p_0 = 0$ we
obviously find $u_i(t) = 0$ for sufficiently small time since $|p_i(t)| < 1$, which holds for all $i$.
Hence for $t < t_1$ with $t_1$ to be determined we find

$$ \partial_t p_i(t) = -\partial_{u_i} G(0), \tag{3.22} $$

which can be integrated easily to

$$ p_i(t_1) = -t_1\, \partial_{u_i} G(0). \tag{3.23} $$

The key observation is that $u_i \neq 0$ for some $i$ is only possible if $|p_i(t_1)| = 1$. This implies that
the first time with possibly nonzero $u$ is

$$ t_1 = \frac{1}{\|\partial_u G(0)\|_\infty}. \tag{3.24} $$

At time $t_1$ the sign of all $u_i$ is determined by $p_i(t_1)$ and one can check that a solution is
obtained by minimizing

$$ u(t_1) \in \arg\min_u G(u) \quad \text{subject to } p_i(t_1) \in \partial |u_i(t_1)|, \tag{3.25} $$

or in other words

$$ u(t_1) \in \arg\min_u G(u) \quad \text{subject to } p_i(t_1) u_i(t_1) \ge |u_i(t_1)| \quad \forall\, i. $$

The optimality condition for the latter problem can be written as

$$ \partial_{u_i} G(u(t_1)) + \lambda_i (q_i - p_i(t_1)) = 0, \qquad q_i \in \partial |u_i(t_1)|, \tag{3.26} $$

for some $\lambda \in \mathbb{R}^N$ satisfying the complementarity conditions

$$ \lambda_i \ge 0, \qquad \lambda_i \left( p_i(t_1) u_i(t_1) - |u_i(t_1)| \right) = 0. $$

This implies $\partial_{u_i} G(u(t_1)) = 0$ if $u_i(t_1) \neq 0$, $\partial_{u_i} G(u(t_1)) \ge 0$ if $u_i(t_1) = 0$ and $p_i(t_1) = 1$, and
$\partial_{u_i} G(u(t_1)) \le 0$ if $u_i(t_1) = 0$ and $p_i(t_1) = -1$. This implies that we can find a time interval
such that

$$ u(t) = u(t_1), \qquad p(t) = p(t_1) - (t - t_1)\, \partial_u G(u(t_1)) $$

is a solution, and $t_2$ is again defined as the minimal time where there exists $i$ such that
$|p_i(t_2)| = 1$ and $|p_i(t)| < 1$ for $t < t_2$ close to $t_2$. Again, the solution at time $t_2$ is defined by a solution of the
variational problem

$$ u(t_2) \in \arg\min_u G(u) \quad \text{subject to } p_i(t_2) u_i(t_2) \ge |u_i(t_2)| \quad \forall\, i. $$

By an inductive procedure one obtains that the same kind of dynamics goes on for all $t$ until
it stops after finitely many time steps $t_n$ at a minimizer of $G$.
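The piecewise constant dynamics becomes fully explicit when the components decouple. As an illustration under assumptions not made in the text, take $G(u) = \frac{1}{2}\|u - f\|^2$ on $\mathbb{R}^N$ with $R$ the $\ell^1$-norm. Then $\partial_{u_i} G(0) = -f_i$, each $p_i(t) = t f_i$ grows linearly until $|p_i| = 1$ at time $1/|f_i|$, and at that time $u_i$ jumps from $0$ to $f_i$, after which $\partial_t p_i = 0$. The following plain Python sketch (with a hypothetical function name) lists the jump times and the states after each jump.

```python
def inverse_scale_space_l1(f):
    """Jump times and states of the inverse scale space flow
    d/dt p = f - u, p in the subdifferential of the l1-norm at u, u(0) = 0, p(0) = 0.

    Returns a list of pairs (t_k, u(t_k)); components with f_i = 0 never activate.
    """
    jump_times = sorted({1.0 / abs(fi) for fi in f if fi != 0.0})
    states = []
    for t in jump_times:
        # A component is active once |p_i(t)| = t*|f_i| has reached 1.
        u = [fi if fi != 0.0 and 1.0 / abs(fi) <= t else 0.0 for fi in f]
        states.append((t, u))
    return states

f = [4.0, -2.0, 0.0, 1.0]
states = inverse_scale_space_l1(f)
first_time = states[0][0]   # equals 1 / max_i |f_i|
```

The largest entries of `f` appear first and the flow reaches the minimizer of $G$, here `f` itself, after three jumps. For a general operator $K$ the components are coupled and each jump requires solving a constrained least-squares problem on the active support, which this sketch does not attempt.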

As mentioned above the scale behaviour of the inverse scale space flow is highly attractive
in image processing. In the polyhedral case there is an essentially exact decomposition into
different scales by the steps made at times $t_k$. Indeed $\partial_t u$ is a sum of concentrated measures
in time, and one may eliminate certain scales by leaving out the corresponding jump
$u(t_k + \tau) - u(t_k - \tau)$. This observation leads the way to a much more general definition of filters
from the inverse scale space method, which was discussed in m 

$$ \partial_t p(t) = f - u(t), \qquad p(t) \in \partial R(u(t)). \tag{3.27} $$

A certain scale filter is defined by

$$ F(f) = u_0 + \int_0^\infty w(t)\, d\partial_t u(t), \tag{3.28} $$

with measurable weights $w(t) \in [0,1]$. In the case $w \equiv 1$ one simply obtains $f$, while certain
scales can be damped out by choosing $w(t) = 0$ for $t$ in an appropriate interval. The design of
filters for specific purposes is an ongoing subject of research.

4 Applications in Partial Differential Equations 

In the following we provide an overview of different aspects of partial differential equations, 
where Bregman distances are a useful tool. Unlike the case of inverse problems and image 
processing discussed above the notion of Bregman distance is not used widely in this field, 
and indeed most applications do not refer to this term or use it in a very hidden way. Our 
goal in the following section is to work out the basic ideas related to Bregman distances 
in a structured way, which sheds new light on many established techniques and hopefully 
also opens routes towards novel results. For this sake we employ a formal approach and 
avoid technicalities such as detailed function spaces, which of course can be worked out from 
existing literature. 
4.1 Entropy Dissipation Methods for Gradient Systems

Entropy dissipation methods are a frequently used tool in partial differential equations (cf. 
[1113]), which is often based on using the logarithmic entropy 

$$ E(u) = \int_\Omega u(x) \log u(x) \, dx \tag{4.1} $$

as a Lyapunov functional, e.g. in diffusion equations (cf. e.g. [21151151 IBl L kinetic equations 
(cf. e.g. [3]), or fluid mechanics (cf. e.g. [58]). In particular in gradient systems also different
convex functionals are used regularly and in a structured way. The abstract form of a gradient 
system is 

$$ \partial_t u(t) = -L(u(t)) E'(u(t)), \tag{4.2} $$

where $L(u)$ is a linear symmetric positive semi-definite operator on appropriate spaces and $E$
a convex energy functional, which we assume differentiable for simplicity (a similar treatment
for non-differentiable convex functionals is possible by using subgradients, but beyond our
scope). The entropy dissipation property can be verified by the straightforward computation

$$ \frac{d}{dt} E(u(t)) = E'(u(t))\, \partial_t u(t) = -\langle E'(u(t)), L(u(t)) E'(u(t)) \rangle \le 0. \tag{4.3} $$

The negative of the right-hand side is frequently called entropy dissipation functional $D(u(t))$
and can be used to derive further quantitative information about the decay to equilibrium. 
A standard example (cf. H [26] ) are nonlinear Fokker-Planck equations of the form 

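$$ \partial_t u = \nabla \cdot \left( m(u) \nabla (e'(u) + V) \right) \tag{4.4} $$

on a domain $\Omega \subset \mathbb{R}^d$ with no-flux boundary conditions. Here, $e : \mathbb{R}^+ \to \mathbb{R}$ is a convex function,
$m : \mathbb{R}^+ \to \mathbb{R}^+$ a (potentially nonlinear) mobility function, and $V : \Omega \to \mathbb{R}$ an external
potential. Recently also systems of Fokker-Planck equations as well as certain reaction-diffusion
systems of the form

$$ \partial_t u_i = D_i \Delta u_i + F_i(u_1, \ldots, u_M), \qquad i = 1, \ldots, M, \tag{4.5} $$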

have been investigated with entropy dissipation techniques (cf. [441 ISTl I47] i. 

The major purpose of entropy dissipation techniques is to obtain qualitative or ideally
quantitative results about the decay to equilibrium for transient solutions. An equilibrium
solution $u_\infty$ is a minimizer of $E$ on a convex set $K$, to which also the transient solution $u(t)$
belongs for all $t$. An example is the Fokker-Planck equation with linear mobility $m(u) = u$,
where $K$ is the set of nonnegative integrable functions with prescribed mean value. Hence,
$u_\infty$ satisfies

$$ E'(u_\infty)(u - u_\infty) \ge 0 \quad \forall\, u \in K. \tag{4.6} $$

If further the operator $L(u)$ is such that

$$ L(u) E'(u_\infty) = 0 \quad \forall\, u \in K, \tag{4.7} $$

which is indeed the case for the typical examples, then one can rewrite the gradient system
as

$$ \partial_t u(t) = -L(u(t)) \left( E'(u(t)) - E'(u_\infty) \right). \tag{4.8} $$
Hence, the right-hand side is expressed as a difference of energy gradients for the transient
and equilibrium solution. In a similar way, the entropy dissipation can be rewritten in terms
of a distance between those, and the Bregman distance (usually called relative entropy) plays
a key role for this purpose. One observes that

$$ \frac{d}{dt} D_E(u(t), u_\infty) = E'(u(t))\, \partial_t u(t) - E'(u_\infty)\, \partial_t u(t) = -\langle E'(u(t)) - E'(u_\infty), L(u(t)) (E'(u(t)) - E'(u_\infty)) \rangle =: -F(u(t), u_\infty). $$

Of course, the above computation holds for smooth solutions only; for weak solutions one can
usually derive the time-integrated version

$$ D_E(u(t), u_\infty) + \int_s^t F(u(\tau), u_\infty)\, d\tau \le D_E(u(s), u_\infty) \quad \text{for } s < t. \tag{4.9} $$
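The decay of the relative entropy can be observed in a small discrete analogue. The sketch below is an illustration only and not a scheme from the paper: it assumes the heat equation (logarithmic entropy with linear mobility and no potential) discretized on a periodic grid with unit spacing, and an explicit Euler step of size `tau`, which for `tau <= 0.5` acts as a doubly stochastic averaging and therefore preserves mass and positivity. The equilibrium is the constant state with the same mass.

```python
import math

def relative_entropy(u, u_inf):
    """Bregman distance of the logarithmic entropy: sum u*log(u/u_inf) + u_inf - u."""
    return sum(a * math.log(a / b) + b - a for a, b in zip(u, u_inf))

def heat_step(u, tau):
    """One explicit Euler step of the discrete heat equation on a periodic grid."""
    n = len(u)
    return [u[i] + tau * (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) for i in range(n)]

u = [0.1, 0.4, 2.0, 1.0, 0.5]
mass = sum(u)
u_inf = [mass / len(u)] * len(u)   # equilibrium: constant state with the same mass

entropies = []
for _ in range(50):
    entropies.append(relative_entropy(u, u_inf))
    u = heat_step(u, tau=0.25)
```

The recorded values decrease monotonically towards zero, mirroring the continuous statement that the Bregman distance between transient and stationary solution is a Lyapunov functional.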

The above computation shows that entropy dissipation can be rephrased as the decrease
of the Bregman distance between stationary and transient solution. We notice that the use
of the Bregman distance is not essential in this computation, but the understanding of this
structure can be quite beneficial, in particular if one wants to use dual variables, the so-called
entropy variables

$$\varphi(t) = E'(u(t)), \qquad \varphi_\infty = E'(u_\infty). \qquad (4.10)$$

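The entropy variable $\varphi$ solves the system

$$\partial_t \big( (E^*)'(\varphi(t)) \big) = -L\big((E^*)'(\varphi(t))\big)\,\varphi(t), \qquad (4.11)$$

where $E^*$ is the convex conjugate of $E$. When analyzing the dual flow (4.11), a dissipation
property can now be derived immediately using relation (2.10). Thus, we obtain a dual
entropy dissipation of the form

$$\frac{d}{dt} D_{E^*}^{u(t)}(\varphi_\infty,\varphi(t)) = -\big\langle \varphi(t) - \varphi_\infty,\, L\big((E^*)'(\varphi(t))\big)(\varphi(t) - \varphi_\infty) \big\rangle. \qquad (4.12)$$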

The duality relation is particularly interesting for constructing approximations in terms of the 
entropy variables, as e.g. carried out for degenerate cross-diffusion systems in (cf. [m[32l[M| l. 
In order to obtain quantitative estimates for the decay one needs 

4.2 Lyapunov Functionals for Gradient Systems out of Equilibrium 

The appropriate use of Bregman distances seems to be less explored, but maybe even more 
crucial for the derivation of Lyapunov functionals if gradient systems are perturbed out of 
equilibrium. The simplest example is the linear Fokker-Planck equation with non-potential 
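force as investigated in [2],

$$\partial_t u = \nabla \cdot (\nabla u + u F) \quad \text{in } \Omega \times \mathbb{R}^+, \qquad (4.13)$$

supplemented by no-flux boundary conditions

$$(\nabla u + u F) \cdot n = 0 \quad \text{on } \partial\Omega \times \mathbb{R}^+. \qquad (4.14)$$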


If the vector field F is not the gradient of some potential function, then a stationary solution 
cannot be constructed as the minimizer of an entropy functional. However, the existence and 
uniqueness of a stationary solution can be shown under quite general assumptions on $F$ (cf.
[33]). In a form similar to gradient flows we write (4.13) as

$$\partial_t u = \nabla\cdot\big(u(\nabla e'(u) + F)\big), \qquad e(u) = u\log u + 1 - u, \qquad (4.15)$$

which suggests to further investigate distances based on the entropy functional

$$E(u) = \int_\Omega e(u)\,dx = \int_\Omega (u \log u - u + 1)\,dx. \qquad (4.16)$$

The dissipation of the relative entropy can be computed via


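$$\begin{aligned}
\frac{d}{dt} D_E^{E'(u_\infty)}(u(t),u_\infty) &= \int_\Omega \big(e'(u(t)) - e'(u_\infty)\big)\,\partial_t u(t)\,dx \\
&= \int_\Omega \big(e'(u(t)) - e'(u_\infty)\big)\,\nabla\cdot\Big(u(t)\big(\nabla(e'(u(t)) - e'(u_\infty)) + \nabla e'(u_\infty) + F\big)\Big)\,dx \\
&= -\int_\Omega u(t)\,\big|\nabla\big(e'(u(t)) - e'(u_\infty)\big)\big|^2\,dx \\
&\quad + \int_\Omega \big(e'(u(t)) - e'(u_\infty)\big)\,\nabla\cdot\big(u(t)(\nabla e'(u_\infty) + F)\big)\,dx,
\end{aligned}$$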

where we have used the no-flux boundary conditions

$$(\nabla e'(u(t)) + F)\cdot n = (\nabla e'(u_\infty) + F)\cdot n = 0 \quad \text{on } \partial\Omega \times \mathbb{R}^+$$

in order to apply integration by parts in the first term on the right-hand side. The second
term is simplified via


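$$\begin{aligned}
\nabla\cdot\big(u(\nabla e'(u_\infty) + F)\big) &= \nabla\cdot\Big(\frac{u}{u_\infty}\,u_\infty(\nabla e'(u_\infty) + F)\Big) \\
&= u_\infty \nabla\Big(\frac{u}{u_\infty}\Big)\cdot(\nabla e'(u_\infty) + F) \\
&= u_\infty \nabla \exp\big(e'(u(t)) - e'(u_\infty)\big)\cdot(\nabla e'(u_\infty) + F) \\
&= u_\infty \exp\big(e'(u(t)) - e'(u_\infty)\big)\,\nabla\big(e'(u(t)) - e'(u_\infty)\big)\cdot(\nabla e'(u_\infty) + F),
\end{aligned}$$

where the second equality uses that $u_\infty$ is stationary, i.e. $\nabla\cdot\big(u_\infty(\nabla e'(u_\infty) + F)\big) = 0$.
With $\Psi$ satisfying $\Psi'(z) = z\exp(z)$ we can further write

$$\int_\Omega \big(e'(u(t)) - e'(u_\infty)\big)\,\nabla\cdot\big(u(t)(\nabla e'(u_\infty) + F)\big)\,dx = \int_\Omega \nabla\Psi\big(e'(u(t)) - e'(u_\infty)\big)\cdot u_\infty(\nabla e'(u_\infty) + F)\,dx = 0,$$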


which can be seen again through integration by parts. Hence, we finally obtain the decrease
of the Bregman distance via

$$\frac{d}{dt} D_E^{E'(u_\infty)}(u(t),u_\infty) = -\int_\Omega u(t)\,\big|\nabla\big(e'(u(t)) - e'(u_\infty)\big)\big|^2\,dx, \qquad (4.17)$$

and the logarithmic Sobolev inequality (cf. [3]) implies exponential convergence to equilibrium.
Another example is given by boundary-driven nonlinear Fokker-Planck equations

$$\partial_t u = \nabla\cdot\big(\nabla m(u) + m(u)F\big) \quad \text{in } \Omega\times\mathbb{R}^+, \qquad (4.18)$$

considered in [13] with Dirichlet boundary conditions

$$u = g \quad \text{on } \partial\Omega\times\mathbb{R}^+. \qquad (4.19)$$


We mention that an analogous analysis holds in the case of no-flux boundary conditions
(in which case we have a direct generalization of the nonsymmetric Fokker-Planck equation
above) or mixed Dirichlet and no-flux boundary conditions. Bodineau et al. [13] construct
Lyapunov functionals of the form

$$H(u,u_\infty) = \int_\Omega \int_{u_\infty(x)}^{u(x,t)} \Phi'\Big(\frac{m(s)}{m(u_\infty(x))}\Big)\,ds\,dx, \qquad (4.20)$$

where $\Phi$ is a nonnegative function with unique minimum at zero. Such a construction seems
far from being intuitive, but it becomes much clearer for $\Phi$ being the logarithmic entropy,
i.e. $\Phi'(t) = \log t$. In this case the Lyapunov functional becomes

$$H(u,u_\infty) = \int_\Omega \int_{u_\infty(x)}^{u(x,t)} \big(\log m(s) - \log m(u_\infty(x))\big)\,ds\,dx, \qquad (4.21)$$


and with a function $e$ such that $e'(s) = \log m(s)$ we further obtain

$$H(u,u_\infty) = \int_\Omega \Big( e(u(x,t)) - e(u_\infty(x)) - e'(u_\infty(x))\big(u(x,t) - u_\infty(x)\big) \Big)\,dx, \qquad (4.22)$$

which is nothing but the Bregman distance for the entropy functional

$$E(u) = \int_\Omega e(u)\,dx, \quad \text{with } e'(u) = \log m(u). \qquad (4.23)$$

Since equation (4.18) can be written as

$$\partial_t u = \nabla\cdot\big( m(u)(\nabla \log m(u) + F) \big) \quad \text{in } \Omega\times\mathbb{R}^+, \qquad (4.24)$$

the above form of $E$ is also a natural choice. The detailed computations for the entropy
dissipation are indeed completely analogous to the case of the linear Fokker-Planck equation;
the crucial point appears to be the logarithmic relation between entropy derivatives $e'(u)$ and
mobilities $m(u)$.
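The passage from (4.21) to the Bregman distance is the fundamental theorem of calculus applied pointwise in $x$. Written out as a sketch, assuming only that $e$ is differentiable with $e' = \log m$ on the range of $u$ and $u_\infty$:

```latex
\int_{u_\infty(x)}^{u(x,t)} \big( \log m(s) - \log m(u_\infty(x)) \big)\, ds
  = \int_{u_\infty(x)}^{u(x,t)} \big( e'(s) - e'(u_\infty(x)) \big)\, ds
  = e(u(x,t)) - e(u_\infty(x)) - e'(u_\infty(x))\,\big( u(x,t) - u_\infty(x) \big).
```

Integrating this identity over $\Omega$ gives the Bregman distance of $E$ between $u(\cdot,t)$ and $u_\infty$ with subgradient $e'(u_\infty)$.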


4.3 Doubly Nonlinear Evolution Equations 

A generalization of gradient systems are doubly nonlinear evolution equations with a gradient
structure either of the form

$$\partial_t p(t) \in -\partial G(u(t)), \qquad p(t) \in \partial F(u(t)) \qquad (4.25)$$

or as

$$\partial F(\partial_t u) + \partial G(u(t)) \ni 0. \qquad (4.26)$$

The best studied case, which is also the one where both coincide, corresponds to $F(u) = \frac{1}{2}\|u\|^2$
for a norm in a Hilbert space, which yields the classical gradient flow

$$\partial_t u(t) \in -\partial G(u(t)). \qquad (4.27)$$

We have seen a system in the form (4.25) already above in the inverse scale space method,
while the form (4.26) appears frequently in mechanical problems (cf. e.g. [H] and references
therein). There is indeed a duality relation for (4.25) and (4.26). Starting from (4.25) we
obtain $u(t) \in \partial G^*(-\partial_t p(t)) \cap \partial F^*(p(t))$, respectively $-u(t) \in \partial G^*(\partial_t p(t))$ if $G$ satisfies a
symmetry condition around zero. This yields

$$\partial G^*(\partial_t p(t)) + \partial F^*(p(t)) \ni 0,$$

the analogue of (4.26).

Doubly nonlinear evolution equations have recently been investigated extensively, and
in particular tools from convex analysis have been employed (cf. [48]). Here we add our
Bregman distance point of view to derive estimates for such equations. Let us start with a
straightforward computation of the time derivative of the Bregman distance:

Lemma 4.1. Let $F$ be differentiable and $u$ a solution of (4.25). Then

$$\frac{d}{dt} D_F^{p(t)}(v,u(t)) = -\langle \partial_t p(t), v - u(t) \rangle \leq G(v) - G(u(t)).$$

This can be used to quantify the distance of $u(t)$ to a minimizer of $G$:

Corollary 4.2. Let $F$ be differentiable, $u_\infty$ a minimizer of $G$, and $u$ a solution of (4.25).
Then

$$\frac{d}{dt} D_F^{p(t)}(u_\infty,u(t)) + D_G^{0}(u(t),u_\infty) \leq 0. \qquad (4.28)$$

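Since it is straightforward to see

$$\frac{d}{dt} D_G^{0}(u(t),u_\infty) = \frac{d}{dt} G(u(t)) \leq 0, \qquad (4.29)$$

we see after integrating (4.28) in time

$$D_F^{p(t)}(u_\infty,u(t)) + t\,D_G^{0}(u(t),u_\infty) \leq D_F^{p(t)}(u_\infty,u(t)) + \int_0^t D_G^{0}(u(s),u_\infty)\,ds \leq D_F^{p(0)}(u_\infty,u(0)), \qquad (4.30)$$

leading to linear decay of the Bregman distance: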


Theorem 4.3. Let $F$ be differentiable, $u_\infty$ a minimizer of $G$, and $u$ a solution of (4.25).
Then

$$D_G^{0}(u(t),u_\infty) \leq \frac{D_F^{p(0)}(u_\infty,u(0))}{t}. \qquad (4.31)$$
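A scalar example makes the decay estimate concrete. The sketch below assumes that (4.31) reads $D_G^0(u(t),u_\infty) \leq D_F^{p(0)}(u_\infty,u(0))/t$, and uses choices that are not from the text: $F(u) = u\log u - u$ and $G(u) = \frac{1}{2}(u - c)^2$ on the positive reals. Then (4.25) becomes the logistic equation $u' = -u(u - c)$, whose solution is known in closed form. The constants `C` and `U0` are hypothetical.

```python
import math

C = 1.0     # minimizer u_inf of G(u) = 0.5 * (u - C)**2 (hypothetical value)
U0 = 0.1    # initial value u(0) > 0 (hypothetical value)

def u(t):
    """Exact solution of u' = -u (u - C), i.e. (4.25) with F(u) = u log u - u."""
    return C / (1.0 + (C / U0 - 1.0) * math.exp(-C * t))

def d_G(v):
    """Bregman distance D_G^0(v, u_inf) = G(v) - G(u_inf), with subgradient 0 at u_inf."""
    return 0.5 * (v - C) ** 2

def d_F(v):
    """Bregman distance D_F^p(u_inf, v) with p = F'(v) = log v."""
    return C * math.log(C / v) - C + v

times = [0.1 * k for k in range(1, 101)]
# Decay bound of Theorem 4.3 on a grid of times.
bound_holds = all(d_G(u(t)) <= d_F(U0) / t for t in times)
# Monotone decrease of the F-distance, as implied by Corollary 4.2.
dual_decreases = all(d_F(u(t2)) <= d_F(u(t1)) for t1, t2 in zip(times, times[1:]))
print(bound_holds, dual_decreases)
```

Both checks are expected to hold on the sampled times; the bound is far from sharp here, since the logistic solution converges exponentially while the theorem only guarantees a rate of order $1/t$.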
4.4 Error Estimates for Nonlinear Elliptic Problems

We finally turn our attention to the analysis of discretization methods for nonlinear elliptic 
problems such as the p-Laplace equation. Such elliptic problems are optimality conditions of 
some energy functional of the form 


$$E(u) = J(u) - \langle f,u \rangle, \qquad (4.32)$$

where $J$ is a convex functional on a Banach space $X$, typically a Sobolev space of first order
derivatives. The elliptic differential equation (or more general differential inclusion) is the
optimality condition

$$p = f, \qquad p \in \partial J(u). \qquad (4.33)$$

A canonical example is the $p$-Laplace equation

$$-\nabla\cdot\big(|\nabla u|^{p-2}\nabla u\big) = f, \qquad (4.34)$$

which is related to the functional

$$J(u) = \frac{1}{p}\int_\Omega |\nabla u(x)|^p\,dx. \qquad (4.35)$$

For variational discretizations of such problems the Bregman distance appears to be a quite
useful tool, which is still not fully exploited. In many approaches the Bregman distance is used
in a hidden way and strict convexity is used to obtain an estimate in terms of the underlying
norms (with potentially suboptimal constants, however). For the $p$-Laplace equation such an
approach is carried out in [32]. Again in the limiting case $p = 1$ related to total variation
minimization the Bregman distance is even more crucial and appears e.g. in [7]. Here we
briefly sketch the obvious role of Bregman distances in Galerkin discretizations of the form

$$E(u) \to \min_{u \in X_h}, \qquad (4.36)$$

where $X_h$ is a finite-dimensional subspace of $X$, e.g. constructed by finite elements.

Let us start by pointing out the basic structure of error estimates for Galerkin methods
in the linear case, related to the minimization of a positive definite quadratic form

$$J(u) = B(u,u), \qquad (4.37)$$

where $B : X \times X \to \mathbb{R}$ is a bounded and coercive bilinear form. The optimality condition in
weak form is given by

$$B(u,v) = \langle f,v \rangle \qquad \forall\, v \in X, \qquad (4.38)$$

and the Galerkin discretization yields a solution $u_h \in X_h$ of

$$B(u_h,v) = \langle f,v \rangle \qquad \forall\, v \in X_h. \qquad (4.39)$$

Error estimates for such discretizations are obtained in two steps: first the error between $u$
and $u_h$ is estimated by the projection error to the subspace $X_h$ and then the projection error
is estimated, e.g. via the interpolation error. The crucial property for the first step is the
so-called Galerkin orthogonality

$$B(u - u_h,v) = 0 \qquad \forall\, v \in X_h, \qquad (4.40)$$
which implies

$$B(u - u_h,u - u_h) = B(u - u_h,u - v) \qquad \forall\, v \in X_h, \qquad (4.41)$$

and by the Cauchy-Schwarz inequality for the positive definite bilinear form $B$

$$B(u - u_h,u - u_h) \leq B(u - v,u - v) \qquad \forall\, v \in X_h. \qquad (4.42)$$

In other words, $u_h$ is the projection of $u$ on the subspace $X_h$, when the (squared) norm induced
by $B$ is used as a distance measure.
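The two properties (4.40) and (4.42) can be observed in the smallest possible setting. The following sketch assumes a symmetric positive definite 2x2 matrix `A` defining $B(x,y) = x^T A y$, a right-hand side `f`, and a one-dimensional subspace spanned by `e`; all numbers are hypothetical illustrations.

```python
def B(A, x, y):
    """Symmetric bilinear form B(x, y) = x^T A y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric positive definite (hypothetical)
f = [1.0, 2.0]
u = [0.2, 0.6]                 # exact solution of A u = f
e = [1.0, 1.0]                 # basis vector spanning X_h

# Galerkin solution u_h = t * e from B(u_h, e) = <f, e>.
t = sum(fi * ei for fi, ei in zip(f, e)) / B(A, e, e)
u_h = [t * ei for ei in e]
err = [ui - uhi for ui, uhi in zip(u, u_h)]

orthogonality = B(A, err, e)   # Galerkin orthogonality (4.40), should vanish
energy_err = B(A, err, err)

def energy_distance(v):
    """B(u - v, u - v) for a competitor v in X_h."""
    d = [ui - vi for ui, vi in zip(u, v)]
    return B(A, d, d)

competitors = [[(k / 50.0) * ei for ei in e] for k in range(-50, 101)]
is_best = all(energy_err <= energy_distance(v) + 1e-15 for v in competitors)
print(orthogonality, is_best)
```

The sampled competitors only probe the best-approximation property (4.42) on a grid in $X_h$; the exact statement follows from the orthogonality relation as in the text.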

Since the term $B(u - v,u - v)$ above is just the Bregman distance related to the quadratic
functional $J$, one might think of an analogous property in the case of nonquadratic $J$, when
the Bregman projection is used. Indeed, we can derive such a relation in the case of arbitrary
convex $J$. For this sake let again $u$ be a minimizer of $E$ and $u_h$ a minimizer of $E$ constrained
to the subspace $X_h$. Then we have $f \in \partial J(u)$ and thus, since $u_h$ minimizes $E$ on $X_h$, we have
for all $v \in X_h$

$$\begin{aligned}
D_J^{f}(u_h,u) &= J(u_h) - J(u) - \langle f,u_h - u \rangle \\
&= E(u_h) - J(u) + \langle f,u \rangle \\
&\leq E(v) - J(u) + \langle f,u \rangle.
\end{aligned}$$

Rewriting the last term we hence obtain the Bregman projection property

$$D_J^{f}(u_h,u) \leq D_J^{f}(v,u) \qquad \forall\, v \in X_h. \qquad (4.43)$$

This observation opens a way to analyze Galerkin methods for such nonlinear problems in
the same way as in the linear case; the key step to be developed for specific problems and
specific discretizations ($X_h$) is the estimation of the Bregman projection error.
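The Bregman projection property (4.43) can be illustrated with a nonquadratic energy in two dimensions. This is a sketch under assumptions not made in the text: $J(u) = \frac{1}{4}\sum_i u_i^4$ stands in for a $p$-Laplace-type energy, the data `f` are chosen so that the exact minimizer is known, and $X_h$ is the span of $(1,1)$.

```python
def J(u):
    """Convex energy J(u) = sum(u_i^4) / 4, a finite-dimensional stand-in."""
    return sum(x ** 4 for x in u) / 4.0

def bregman(v, u, p):
    """D_J^p(v, u) = J(v) - J(u) - <p, v - u> for a subgradient p of J at u."""
    return J(v) - J(u) - sum(pi * (vi - ui) for pi, vi, ui in zip(p, v, u))

f = [1.0, 8.0]                      # hypothetical data
u = [1.0, 2.0]                      # minimizer of E: u_i^3 = f_i, hence f = J'(u)
t = (sum(f) / 2.0) ** (1.0 / 3.0)   # minimizer of E on X_h = span{(1, 1)}
u_h = [t, t]

d_h = bregman(u_h, u, f)
# Compare against other elements of X_h sampled on a grid.
grid = [[k / 100.0, k / 100.0] for k in range(-300, 301)]
is_projection = all(d_h <= bregman(v, u, f) + 1e-12 for v in grid)
print(d_h, is_projection)
```

Because $D_J^{f}(v,u) = E(v) - E(u)$ for every $v$, minimizing the Bregman distance over $X_h$ is the same as minimizing $E$ over $X_h$, which is what the grid comparison reflects.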

Note again the role of the Bregman distance for error estimation: The one-sided distance
$D_J^{f}(u_h,u)$ is particularly suitable for the estimation of a-priori errors as above, while
a-posteriori error estimation should rather be based on the distance $D_J^{p_h}(u,u_h)$ with
$p_h \in \partial J(u_h)$. We have by the minimizing property of $u$

$$\begin{aligned}
D_J^{p_h}(u,u_h) &= J(u) - J(u_h) - \langle p_h,u - u_h \rangle \\
&= E(u) - E(u_h) + \langle p_h - f,u_h - u \rangle \\
&\leq \langle p_h - f,u_h - u \rangle.
\end{aligned}$$

Using the duality relation $u \in \partial J^*(f)$, this could be further estimated to the full a-posteriori
estimate

$$D_J^{p_h}(u,u_h) \leq \langle p_h - f,u_h \rangle + J^*(2f - p_h) - J^*(f). \qquad (4.44)$$
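For a scalar model problem the a-posteriori estimate (4.44) can be evaluated in closed form. The sketch below assumes $J(u) = u^4/4$, for which the convex conjugate is $J^*(q) = \frac{3}{4}|q|^{4/3}$, and $f = 1$, so that the exact solution is $u = 1$; the trial values for $u_h$ are arbitrary and hypothetical.

```python
def J(u):
    return u ** 4 / 4.0

def J_conj(q):
    """Convex conjugate of J(u) = u^4 / 4, namely J*(q) = (3/4) |q|^(4/3)."""
    return 0.75 * abs(q) ** (4.0 / 3.0)

f = 1.0
u = 1.0   # exact minimizer of E(u) = J(u) - f u, since u^3 = f

def check(u_h):
    """Return (left-hand side, right-hand side) of (4.44) for an approximation u_h."""
    p_h = u_h ** 3                                   # p_h = J'(u_h)
    lhs = J(u) - J(u_h) - p_h * (u - u_h)            # D_J^{p_h}(u, u_h)
    rhs = (p_h - f) * u_h + J_conj(2 * f - p_h) - J_conj(f)
    return lhs, rhs

results = [check(u_h) for u_h in (0.5, 0.8, 1.0, 1.2, 1.5)]
for lhs, rhs in results:
    print(lhs, rhs)
```

The right-hand side involves only $u_h$, $p_h$, $f$ and $J^*$, so it is computable without knowing $u$; in this example the exact solution is used solely to evaluate the left-hand side for comparison.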

For practical purposes the above abstract estimate is not useful in most cases, since computing
the convex conjugate $J^*$ means to solve a nonlinear partial differential equation as well, which might
be as difficult as the original one. However, the general strategy can be exploited together
with specific properties of the functional $J$ and the subspace $X_h$. In particular for gradient
energies of the form

$$J(u) = \int_\Omega j(\nabla u)\,dx \qquad (4.45)$$

one can derive alternative versions using only the convex conjugate $j^*$, which is significantly
easier to compute.
5 Further Developments


In this final section we discuss some aspects of Bregman distances that came up recently and 
will potentially have strong further impact, in particular we will explore some developments 
related to probability. 

5.1 Uncertainty Quantification in Inverse Problems 

Since Bregman distances appear to be a suitable tool for estimates in certain nonlinear deterministic
problems, it seems natural to exploit them also in the stochastic counterparts of
such problems. The obvious measure for error estimates is then the expected value of the 
Bregman distance with respect to the stochastic quantity. Such approaches have been used 
successfully in particular in statistical inverse problems (cf. e.g. [HU]), which we also want 
to discuss in the following. In order to avoid technicalities we restrict ourselves to a purely 
finite-dimensional setup. 

Consider the inverse problem $Ku = f$, where $K : \mathbb{R}^N \to \mathbb{R}^M$ and the data are generated
from a true solution $u^*$ with additive Gaussian noise, i.e.

$$f = Ku^* + \sigma n, \qquad (5.1)$$

with $n$ a Gaussian random variable with zero mean and covariance matrix $I_M$. Let again $R$
be a convex regularization functional and $u_\alpha$ a solution of the variational problem

$$\frac{1}{2\sigma^2}\|Ku - f\|^2 + \alpha R(u) \to \min_u. \qquad (5.2)$$

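Then $u_\alpha$ satisfies the optimality condition

$$\frac{1}{\sigma^2}K^*K(u_\alpha - u^*) + \alpha p_\alpha = \frac{1}{\sigma}K^*n, \qquad p_\alpha \in \partial R(u_\alpha), \qquad (5.3)$$

which implies $p_\alpha = K^*w_\alpha$. Now assume $u^*$ satisfies the source condition (3.14), then we have

$$K(u_\alpha - u^*) + \alpha\sigma^2(w_\alpha - w^*) = \sigma n - \alpha\sigma^2 w^*.$$

Taking the squared norm and subsequently the expectation with respect to $n$ in this identity we
obtain

$$\begin{aligned}
2\alpha\sigma^2\,\mathbb{E}\big[D_R^{p_\alpha,p^*}(u_\alpha,u^*)\big] &\leq \mathbb{E}\big[\|K(u_\alpha - u^*)\|^2 + 2\alpha\sigma^2 D_R^{p_\alpha,p^*}(u_\alpha,u^*) + \alpha^2\sigma^4\|w_\alpha - w^*\|^2\big] \\
&= \mathbb{E}\big[\|\sigma n - \alpha\sigma^2 w^*\|^2\big] \\
&= \sigma^2\,\mathbb{E}\big[\|n\|^2\big] + \alpha^2\sigma^4\|w^*\|^2 = \sigma^2 M + \alpha^2\sigma^4\|w^*\|^2.
\end{aligned}$$

Thus, the expected error in the Bregman distance is estimated by

$$\mathbb{E}\big[D_R^{p_\alpha,p^*}(u_\alpha,u^*)\big] \leq \frac{M}{2\alpha} + \frac{\alpha\sigma^2}{2}\|w^*\|^2. \qquad (5.4)$$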
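A quadratic special case allows the expected error to be compared with the bound directly. The sketch below rests on assumptions that are not in the text: $K$ is the identity on $\mathbb{R}^M$, $R(u) = \frac{1}{2}\|u\|^2$ (so the symmetric Bregman distance is $\|u_\alpha - u^*\|^2$ and the source condition holds with $w^* = u^*$), and the estimate (5.4) is read as $M/(2\alpha) + \alpha\sigma^2\|w^*\|^2/2$. The values of `M`, `alpha`, `sigma` and the sampling setup are hypothetical.

```python
import random

def expected_error(alpha, sigma, u_true, trials=2000, seed=0):
    """Monte Carlo estimate of E[|u_alpha - u*|^2] for K = I, R(u) = |u|^2 / 2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Noisy data f = u* + sigma * n with standard Gaussian n, as in (5.1).
        f = [x + sigma * rng.gauss(0.0, 1.0) for x in u_true]
        # Closed-form minimizer of |u - f|^2 / (2 sigma^2) + alpha |u|^2 / 2.
        u_alpha = [fi / (1.0 + alpha * sigma ** 2) for fi in f]
        total += sum((a - b) ** 2 for a, b in zip(u_alpha, u_true))
    return total / trials

M, alpha, sigma = 20, 1.0, 0.5
u_true = [1.0] * M
w_norm2 = sum(x * x for x in u_true)      # |w*|^2 with w* = u*

mc = expected_error(alpha, sigma, u_true)
# Exact expectation in this quadratic case, from the three-term identity.
exact = (sigma ** 2 * M + alpha ** 2 * sigma ** 4 * w_norm2) / (1.0 + alpha * sigma ** 2) ** 2
bound = M / (2.0 * alpha) + alpha * sigma ** 2 * w_norm2 / 2.0
print(mc, exact, bound)
```

In this special case all three error measures are multiples of $\|u_\alpha - u^*\|^2$, so the sum has the exact expected value $\sigma^2 M + \alpha^2\sigma^4\|w^*\|^2$, and the gap between `exact` and `bound` shows how much is lost by dropping the residual and source-space terms.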

We notice that the above approach not only yields an estimate of the Bregman distance,
but indeed an exact value for the sum of three error measures: in addition to the Bregman
distance also the residual error as well as the error in the source space (related to $w_\alpha - w^*$).
Usually the latter is the largest of the three, so one needs to expect a blow-up of this term as
$M \to \infty$ if $\alpha$ is not increasing as $M$. If one is interested in the first two terms only, one can
simply use a duality product with $u_\alpha - u^*$ in (5.3) and subsequently estimate the expected
value of the right-hand side in a different way, which may lead to robust estimates in terms
of $M$, respectively estimates that can be carried out for infinite-dimensional white noise.

An application of Bregman distances in Bayesian modelling was recently investigated in
[18], considering frequently used posterior densities of the form

$$\pi(u|f) \sim \exp\Big(-\frac{1}{2\sigma^2}\|Ku - f\|^2 - \alpha R(u)\Big), \qquad (5.5)$$

where again $R$ is a convex and Lipschitz continuous functional on $\mathbb{R}^N$ (generalizations to
posterior distributions in infinite-dimensional spaces were further studied in [H]). It has
been shown that the posterior can be centered around the so-called maximum a-posteriori
probability (MAP) estimate $\hat u$, which maximizes $\pi(u|f)$, in the form

$$\pi(u|f) \sim \exp\Big(-\frac{1}{2\sigma^2}\|Ku - K\hat u\|^2 - \alpha D_R^{\hat p}(u,\hat u)\Big). \qquad (5.6)$$


Based on the observation

$$\langle s, u - \hat u \rangle = \frac{1}{\sigma^2}\|Ku - K\hat u\|^2 + \alpha\langle p - \hat p, u - \hat u \rangle \qquad (5.7)$$

for $p \in \partial R(u)$ and

$$s = \frac{1}{\sigma^2}K^*(Ku - f) + \alpha p \in \partial(-\log \pi(u|f)),$$

a Bayes cost of the form

$$\Psi(u,v) = \frac{1}{\sigma^2}\|Ku - Kv\|^2 + \alpha\langle q - p, v - u \rangle \qquad (5.8)$$

has been introduced for $q \in \partial R(v)$ (note that selection of $p \in \partial R(u)$ is only needed on a
set of zero measure due to Rademacher's theorem). A simple integration by parts argument
then shows that the MAP estimate $\hat u$ is a minimizer of the Bayes cost, which is a quite
natural choice compared to the highly degenerate cost usually used to characterize MAP
estimates (cf. [15]). A direct consequence is the fact that the MAP estimate has smaller
Bregman distance in expectation than the frequently used conditional mean estimate; hence
one obtains a theoretical argument explaining the success of MAP estimates in practice.


5.2 Bregman Distances and Optimal Transport 

Bregman distances can be used also as a cost in optimal transport, which has been investigated
in [25] for a convex and differentiable functional $J$ on $\mathbb{R}^N$. Given two probability measures $\mu$
and $\nu$, an optimal transport plan is a probability measure $\gamma$ on $\mathbb{R}^N \times \mathbb{R}^N$ with marginals $\mu$
and $\nu$ minimizing the functional

$$F(\gamma) = \int_{\mathbb{R}^N\times\mathbb{R}^N} D_J^{J'(u)}(v,u)\,d\gamma(v,u). \qquad (5.9)$$

The resulting optimal value of $F$ can be interpreted as a transport distance between the
measures $\mu$ and $\nu$.


Besides the important question of well-posedness solved in [25], there are several
interesting problems such as the existence of transport maps under certain conditions (i.e.
concentration of $\gamma$ on a set described by the graph of a map $T : \mathbb{R}^N \to \mathbb{R}^N$) as well as
relations to uncertainty quantification. A first example is the Bayes cost approach described
in the previous section, which can indeed be interpreted as the transport distance between 
the posterior distribution and a measure concentrated at the MAP estimate. This motivates 
further research in the future, an obvious next step might be to estimate distances between 
different posterior distributions in transport distances related to Bregman distances. 

A different use of Bregman distances in optimal transport was recently made in [8] for
the solution of Monge-Kantorovich formulations in optimal transport. They consider entropic
regularizations of the problem, i.e. for $\epsilon > 0$ they minimize a discrete version of

$$F_\epsilon(\gamma) = \int_{\mathbb{R}^N\times\mathbb{R}^N} C(v,u)\,d\gamma(v,u) + \epsilon E(\gamma), \qquad (5.10)$$

where $E$ is the entropy

$$E(\gamma) = \int_{\mathbb{R}^N\times\mathbb{R}^N} \log\Big(\frac{d\gamma}{d\mathcal{L}}\Big)\,d\gamma(v,u), \qquad (5.11)$$

where $\frac{d\gamma}{d\mathcal{L}}$ is the Radon-Nikodym derivative with respect to the Lebesgue measure. The key
observation is that the minimization of $F_\epsilon$ can be rewritten equivalently as the minimization
of the Kullback-Leibler divergence, i.e. the Bregman distance related to $E$, between $\gamma$ and
the Gibbs measure with density $\xi_\epsilon$ proportional to $\exp(-C/\epsilon)$,

$$D_E(\gamma,\xi_\epsilon) \to \min_\gamma, \qquad (5.12)$$

which transforms the problem into a Bregman projection problem of the Gibbs density onto
the set of plans with given marginals, which can be computed much more efficiently than the
original transport control problem. Note that the general procedure can be carried out as
well with an arbitrary convex functional whose domain are positive densities; the corresponding
Gibbs density is then to be defined as $\xi_\epsilon = (E^*)'(-C/\epsilon)$. A particular computational
advantage of the logarithmic entropy is the fact that iterative Bregman projections can be
computed explicitly and realized with low complexity; in the discrete setting it only needs
multiplications and scalar products of diagonal matrices with the matrix discretizing the Gibbs
measure (cf. |H] for further details). 
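In the discrete setting the alternating Bregman projections reduce to rescaling rows and columns of the Gibbs kernel, which is the classical Sinkhorn-type iteration. The following is a minimal sketch of that idea, not the implementation of the cited work: the function name `sinkhorn`, the three-point grid, the cost $|x - y|$, the marginals and the value of `eps` are all hypothetical choices.

```python
import math

def sinkhorn(a, b, cost, eps, iterations=200):
    """Alternating KL (Bregman) projections onto the two marginal constraints."""
    n, m = len(a), len(b)
    gibbs = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    r = [1.0] * n   # row scalings, entries of a diagonal matrix
    c = [1.0] * m   # column scalings, entries of a diagonal matrix
    for _ in range(iterations):
        # Projection onto the plans with row marginal a.
        r = [a[i] / sum(gibbs[i][j] * c[j] for j in range(m)) for i in range(n)]
        # Projection onto the plans with column marginal b.
        c = [b[j] / sum(gibbs[i][j] * r[i] for i in range(n)) for j in range(m)]
    return [[r[i] * gibbs[i][j] * c[j] for j in range(m)] for i in range(n)]

points = [0.0, 1.0, 2.0]
cost = [[abs(x - y) for y in points] for x in points]
a = [0.5, 0.3, 0.2]   # first marginal
b = [0.2, 0.2, 0.6]   # second marginal
plan = sinkhorn(a, b, cost, eps=1.0)
rows = [sum(row) for row in plan]
cols = [sum(plan[i][j] for i in range(3)) for j in range(3)]
print(rows, cols)
```

Each projection only multiplies the kernel by a diagonal matrix, and the scalings are obtained from matrix-vector products, which is the low complexity referred to above. For small `eps` the kernel entries become tiny and a log-domain variant is usually preferable; that refinement is omitted here.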

5.3 Infimal Convolution of Bregman Distances 

Infimal convolutions of convex functionals became popular recently in image processing in order
to combine favourable properties of certain regularization functionals, e.g. total variation and
higher-order versions thereof (REFs). A quite unexplored topic, however, is the infimal convolution of
Bregman distances. Since they are convex functionals of the first variable one may
consider the infimal convolution

$$\big[D_J^{p_1}(\cdot,u_1) \,\Box\, D_J^{p_2}(\cdot,u_2)\big](u) = \inf_{v \in X} \big[D_J^{p_1}(u - v,u_1) + D_J^{p_2}(v,u_2)\big], \qquad (5.13)$$

with an obvious extension to more than two values.
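A scalar toy case shows what the infimal convolution (5.13) does for a one-homogeneous functional. The sketch assumes $J = |\cdot|$ on the real line, for which $D_J^{p}(u,u_{\mathrm{ref}}) = |u| - p\,u$ whenever $p$ is a subgradient at $u_{\mathrm{ref}}$, and approximates the infimum over $v$ by a minimum over a grid; the grid and the sample points are hypothetical.

```python
def bregman_abs(u, p):
    """D_J^p(u, u_ref) for J = |.| with p a subgradient at u_ref: |u| - p * u."""
    return abs(u) - p * u

def inf_conv(u, p1, p2, grid):
    """Grid approximation of the infimal convolution (5.13) at the point u."""
    return min(bregman_abs(u - v, p1) + bregman_abs(v, p2) for v in grid)

grid = [k / 100.0 for k in range(-500, 501)]
samples = (-2.0, -0.5, 0.0, 1.0, 3.0)

# Single Bregman distance with p = 1: penalizes only values of the wrong sign.
single = [bregman_abs(u, 1.0) for u in samples]
# Infimal convolution with p2 = -p1: the sign information is removed.
both_signs = [inf_conv(u, 1.0, -1.0, grid) for u in samples]
# Reference without a jump (p1 = p2 = 0): every nonzero value is penalized.
no_edge = [inf_conv(u, 0.0, 0.0, grid) for u in samples]
print(single, both_signs, no_edge)
```

In this scalar caricature the infimal convolution with $p_2 = -p_1$ vanishes wherever the reference has a nonzero value, regardless of sign, while it equals $|u|$ where the reference subgradient is zero; this is one way to read the statement that the construction compares supports rather than signs.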

Of particular interest in imaging applications appears to be the case of $p_2 = -p_1$ and
$u_2 = -u_1$ for a one-homogeneous functional such as total variation. The latter was used to
obtain a regularization functional enforcing partly equal edge sets (REF colorbregman). While
minimizing the Bregman distance for total variation strongly favours edge sets with jumps
of equal sign (see also the discussion related to orientation for one-homogeneous functionals
in Section 2.4), the infimal convolution of Bregman distances eliminates this part and hence
measures differences in edge sets rather than jumps of the same sign. A further study of
theoretical properties as well as applications of such kind of infimal convolution of Bregman
distances remains an interesting topic for future research. One obvious candidate are
problems in compressed sensing, where one first of all aims at obtaining the correct support
of the solution rather than the sign.